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The results of a statistical analysis of telephone noise are presented. 
The analysis consists of two stages: an exploratory data analysis stage, 
where the data are characterized through various nonparametric statistics 
and a model-building stage, where the data are matched to models. 

The exploratory data analysis stage involved examination of noise 
waveforms, power spectra, and covariance estimates. The results show that 
telephone noise consists of a deterministic component (sinusoids at various 
frequencies) and a stochastic component. It is assumed that these com- 
ponents add. The data are filtered to remove the deterministic component. 
Next, central moment estimates are presented, as well as first-order ampli- 
tude statistics (histograms and empirical cumulative distributions) for 
these filtered data. The results indicate that the filtered data appear wide- 
sense stationary over short periods of time (typically 1 second) and, 
although close to gaussian, are distinctly nongaussian. 

The model-building stage involved fitting the filtered data to two classes 
of models. The first class of models is based on symmetric stable distribu- 
tions that arise from the central limit theorem. The second class of models 
assumes two different physical processes that contribute to the random 
component of telephone noise: The low-variance process is assumed to be 
gaussian, while the high-variance component is assumed to be a filtered 
Poisson process. Both classes of models agree intuitively with the physical 
processes generating telephone noise and are mathematically tractable. 
Based largely on graphical tests, both models appear to fit the filtered data 
reasonably well. 

I. INTRODUCTION 

Noise on telephone lines has puzzled and plagued people since the 
invention of the telephone. While it is common knowledge that tele- 
phone channel noise is nongaussian, nowhere in the literature is there 
a clear account of an adequate statistical characterization of telephone 
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noise. In part, this is due to the fact that only recently have statistical 
tools been developed that are equal to the task. 

This paper attempts to adequately characterize some statistical 
properties of telephone channel noise by means of various nonparamet- 
ric statistics and by mathematical models. It is encouraging to note 
that the results presented here do not contradict those found in earlier 
works. However, since only a small number of telephone line noise 
sample functions were examined, the results must be regarded as 
tentative, awaiting independent checks by other investigators. It is 
hoped the results presented here will stimulate communication theorists 
to investigate new methods for optimally processing signals corrupted 
by the nongaussian noise models presented here. Work along these 
lines might lead to optimum and practical suboptimum receiver 
structures for combating telephone noise. This in turn might permit 
greater insight into how noise limits telephone channel performance 
with regard to voice communication or data transmission. 

The authors have tried to keep the notation and nomenclature con- 
sistent with that used in statistics and probability theory. The reader 
is reminded, for example, that "empirical cumulative distribution 
function" refers to an estimate of the true "cumulative distribution 
function" based on observations of "empirical" data. The words 
"sample" and "empirical" are often used interchangeably, as in 
"sample mean" and "empirical mean," as compared with the ensemble 
mean. 

1.1 Summary of past work 

Broadly speaking, previous investigators characterized telephone 
channel noise in two different ways, based on different ways of measur- 
ing the data and with different problems in mind. First, direct measure- 
ments of sample functions of telephone channel noise have been carried 
out 1-8 and mathematical models for the noise statistics have been 
constructed. Second, digital signals have been transmitted over tele- 
phone lines and the difference between the transmitted and received 
signals has been analyzed, providing an indirect measurement of 
telephone channel noise. 4-16 It is extremely difficult to correlate these 
two types of measurements. This paper is solely concerned with direct 
measurements of telephone channel noise sample functions. 

Both types of measurements indicated the nongaussian nature of the 
noise. The analog measurements suggested that telephone noise could 
be considered a mixture of a nongaussian random process with sinusoids 
at various frequencies. 1-3 The first-order amplitude statistics for the 
random process appeared to be adequately modeled by a Pareto 
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distribution. 2 Analysis of errors in digital signals transmitted over 
telephone lines revealed that errors cluster in time, an indirect measure 
that the noise cannot be adequately modeled as white and gaussian. 4 - 13 
Some causes of telephone noise are thermal noise in amplifiers, 1,17 
switches sparking on opening or closing contact, 1-3 ,17 ~ 19 lightning, 317 
electromagnetic crosstalk, 17 fading on microwave links and switching 
to guard bands, 317 echo suppressor turnaround, 17 disturbances because 
of maintenance, 17 power line harmonics as well as harmonics of all 
sinusoids used in the telephone switching system, 17 and noise generated 
at switch interfaces. 17 The main causes of telephone noise are felt to 
be due to thermal noise, switch arcing, and pickup of unwanted 
sinusoidal harmonics. 17 The main cause of error clustering in digital 
signals is felt to be due to the impulsive component of the noise, 
generated by switch arcing. 17 

1.2 Problem statement 

The problem is twofold : 

(i) To provide an adequate statistical characterization of telephone 

channel noise by means of various nonparametric statistics. 
(ii) To allow the data plus knowledge of the physical processes 
generating telephone noise to lead to a mathematically tractable 
class of models. 

1.3 Outline of the paper 

The data from five telephone lines and the processing necessary to 
convert the data into a form suitable for further analysis are described 
first. The analysis is broken down into two steps, an exploratory data 
analysis stage where the data are characterized through various non- 
parametric statistics and a model-building stage where the data are 
matched to models. 

The exploratory data analysis stage involved examination of noise 
waveforms, power spectra, and covariance estimates. The results show 
that the data consisted of a deterministic component (sinusoids at 
various frequencies) plus a stochastic component, which were assumed 
to be independent. The data were filtered to remove the sinusoids that 
were significantly larger than the stochastic component. Histograms 
and empirical cumulative distribution functions for the filtered data 
were examined, as well as central moment estimates. The filtered data 
appeared to be wide-sense stationary over short periods of time, typi- 
cally 1 second. Based largely on quantile-quantile plots, it was con- 
cluded that, although close to gaussian, the filtered data for three out 
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of the five lines were distinctly nongaussian; the filtered data for the 
remaining two lines appeared to be gaussian. 

The model-building stage involved fitting the filtered data to two 
classes of models. The first class of models is highly unstructured; it 
was based on stable distributions, infinitely divisible distributions that 
arise from the central limit theorem. Based on a series of parameter 
estimation procedures including robust estimation, maximum likeli- 
hood estimation, and quantile-quantile plots and backed up by a 
likelihood ratio test on the goodness-of-fit, the three nongaussian lines 
could be adequately modeled by a stable distribution with characteris- 
tic index =1.95 (a gaussian has characteristic 2.0). 

The second class of models is much more structured than the first. 
Two different physical processes were assumed to contribute to the 
filtered data: the low- variance component was a stationary gaussian 
process, while the high-variance component was a filtered Poisson 
process. Parameters for the gaussian component were estimated using 
trimmed means and trimmed variances. The parameters specifying 
the filtered Poisson process were much more complicated to estimate. 
The instants of time at which noise bursts occurred and the intervals 
between bursts were first examined; based on power spectra as well 
as covariance estimates, the intervals appeared to come from a renewal 
process. Histograms and empirical cumulative distribution functions 
indicated that the time intervals came from a Poisson process ; empirical 
survivor and hazard function plots showed that a Poisson process with 
constant rate parameter was not an adequate model, however. Because 
of the small number of bursts observed, it was quite difficult to fit the 
time intervals to a Poisson process with varying rate parameter, and 
for expediency a constant Poisson rate parameter was chosen to model 
noise burst times of occurrence. The amplitudes of the noise bursts 
were adequately modeled by a log normal and power Rayleigh, or 
generalized gamma. The durations of actual noise bursts were used to 
estimate parameters in the noise burst shaping filter. A simple indica- 
tion is presented of how well the filtered data fit the gaussian-plus- 
filtered-Poisson-process model. A number of other models and exten- 
sions of these models are discussed. 

II. EXPLORATORY DATA ANALYSIS 
2.1 Description of the data 

The data, supplied by J. Fennick, consist of analog tape recordings 
of noise on five telephone lines. Figure 1 is a diagram of the measure- 
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Fig. 1 — Telephone noise measurement system. 

ment system. Table I describes the origin and destination of each line, 
as well as the date and time of the recording. 

The origin of the telephone line was terminated with the nominal 
characteristic impedance of the line, 600 ohms. The output of the line 
was low-pass filtered to remove spurious out-of-band signals, amplified, 
and recorded on an analog tape at 18.75 cm per second. The system 
was calibrated before each recording with a 15-second burst of a 
325-Hz sinusoid at a predetermined amplitude. The dynamic range of 
the recording system was approximately 100. 20 No attempt was made 
to eliminate dc offset and drift. Each recording was approximately 
15 minutes long. 

Figure 2 is a block diagram of the analog-to-digital tape conversion 
system. The analog tapes were played back on an analog tape recorder 
(of a different model than that on which they were recorded) at 18.75 
cm per second. The calibration signal set the gain on the playback 
amplifier so that the calibration signal amplitude was roughly equal to 
its value at the recording site. There was no attempt to remove wow 
and flutter in the tape recording. 21 The signal was low-pass filtered (to 
lessen the chance for spectral aliasing), amplified, sampled 10,000 times 
a second, passed through an analog-to-digital converter, and subse- 
quently put into digital format on a tape suitable for further processing 

Table I — Description of data 



Line 


Origin 


Destination 


Approxi- 
mate 
Length 
(km) 


Date 


Time 


1 
2 
3 
4 
5 


Monroe, Mich. 
Saginaw, Mich. 
St. Louis, Mo. 
Little Rock, Ark. 
Newark, N. J. 


Detroit, Mich. 
Detroit, Mich. 
Ft. Smith, Ark. 
Ft. Smith, Ark. 
Binghamton, N. Y. 


55 
160 
690 
250 
400 


7/8/64 
7/8/64 
8/4/64 
8/4/64 
12/12/63 


2:06p.m. 

2:49p.m. 
10:30 a.m. 

1:55 p.m. 
12:52 p.m. 
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Fig. 2 — Analog-to-digital conversion of telephone noise data. 

by a Honeywell 6070 digital computer. The levels of the analog-to- 
digital converter will be the units specifying the amplitudes of the 
telephone noise data. No measures of the degradation in data on 
digital tapes resulting from jitter during the sampling process are 
available; it is assumed to be negligible compared to other sources of 
measurement error. No bounds are available on the loss of information 
entailed by examining a continuous-time random process at only 
discrete instants of time. 22 

Two critical remarks concerning the method of recording the data 
should be made : first, there are no quantitative measures available on 
the amount of noise introduced into the data by the measurement 
system alone. Presumably, any measurement system noise was in- 
significant compared to the telephone channel noise. Second, the 
dynamic range of the recording system is probably insufficient to 
faithfully record telephone noise; a much more satisfactory dynamic 
range would be 1000 to 10,000. Both issues have been dealt with else- 
where (in a different context) ; a possible solution would be to convert 
the data into digital format directly at the recording site. 23 Considera- 
tion of these problems is left to future research; the data analysis 
proceeded with these caveats in mind. 

Figure 3 shows a typical telephone channel noise waveform from 
line 1 after conversion to digital format. 

How typical are these data compared with that observed on other 
telephone lines? A search of the literature as well as private communica- 
tions from engineers shows that the data discussed here appear to be 
typical of telephone channel noise. Throughout this investigation, 
nothing was uncovered that contradicted earlier work; rather, this 
work tends to clarify and place in perspective that of earlier investiga- 
tors. Note too that the telephone lines examined here were typically 
several hundred miles long, presumably passing through a variety of 
equipment, and hence quite representative of telephone noise. Finally, 
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Fig. 3 — Typical line 1 telephone noise waveform (unfiltered). 

the analog tape recordings were played into loudspeakers, and the 
authors felt the noise sounded typical. 

A much more serious objection to the present analysis is that not 
enough of the data at hand was examined. If all five 15-minute noise 
recordings were sampled 10,000 times a second and then put on to 
tape in digital format, more than 45 million noise data must be ana- 
lyzed ; in particular, 9 million data must be processed and statistically 
characterized for just one telephone line sample function. In practice, 
only 10-second segments from the beginning and middle of a recording 
were examined in detail and compared with each other. No unusual 
statistical differences were observed between these segments for any 
telephone line. The main reason for examining so little data was the 
great cost of analyzing these data statistically and, in particular, of 
digitally filtering the data to remove sinusoids. It is difficult to predict 
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Fig. 4 — Typical line 1 telephone noise waveform (filtered). 

in advance exactly which sinusoids are present on a particular telephone 
line; it is easier to filter these out digitally after the measurement 
without distortion than to accomplish this with analog niters. 

2.2 Data modifications and estimation of power density spectra 

As mentioned above, the data went through several stages before 
they were available in digital form. Some further processing is necessary 
to remove the effects of this pre-processing, as well as to remove un- 
wanted sinusoids. Since the frequency response of each line was un- 
known, nothing was done to compensate for it. 

The first step is to compute estimates of the power spectrum. The 
data were segmented into blocks (typically of length 1000, correspond- 
ing to to of a second of noise). Each block was tapered and enlarged 
to 1024 values by adding zeros, then transformed into the frequency 
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domain by the fast Fourier transform (FFT) ; 2 * the power spectrum 
was estimated from the transformed data. The method used is com- 
parable in bias and variance to other spectral estimation procedures, 
but requires considerably less computer time than other non-FFT- 
based methods. 25-27 Furthermore, it is possible to calculate cross- 
spectrum estimates easily, as well as to check for nonstationarity by 
computing the spectra of successive segments of data. The discrete 
Fourier transform of N successive data at frequency wi = l/NAt is 
fM, 

Km) = t x n K e-^ Kl > N I - 0, 1, ■ • ., [ ^ 1 , 



where 
and 



uk = sample of noise waveform at time KAt 



At = time interval between samples. 
The estimate of the power spectrum density at frequency Wi is S(wi), 



At M 
*» j = — M 



where 



M 

Z. 9i = 1, 

j M 



and the weights ( gr,- } are introduced to smooth the estimate. Unequal 
weights can be used to lower the bias of the estimate, but increase 
its variance. The value S(wi) represents the average noise power 
density in a frequency band centered at w t . All power spectrum density 
estimates shown here were computed with g, equal to (M — | j | )/M 2 , 
where M = 5. Figure 5 shows the power spectrum for the waveform in 
Fig. 3 with two sharp peaks probably reflecting sinusoids at 650 and 
4300 Hz. Figure 6 shows a succession of 24 power spectrum estimates 
for line 1 for T&-second segments of filtered data. The first 13 are from 
successive segments recorded at the beginning, while the final 11 are 
from successive segments recorded 5 minutes later. These results 
indicate that line 1 data can be regarded as wide-sense stationary over 
at least 1-second time intervals. The variance of these spectral estimates 
is unknown; if the process were gaussian, the distribution of S(wi) can 
be approximated by a X 2 distribution with [2/^JL M g]~\ degrees of 
freedom. 28 For g t = (M — \j\ )/M 2 with M = 5, this results in ap- 
proximately 14 degrees of freedom. From the data shown here for line 
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Fig. 5 — Typical power spectrum for unfiltered line 1 data (N = 1000). 

1, as well as data from the other four lines, telephone noise power 
density spectra appear to have the same shape, but different scales. 
Table II summarizes the estimates of frequencies of signals which 
were quite probably sinusoids, and whose estimated power spectrum 
density was at least a factor of 10 above the estimated wideband power 

Table II — Estimated frequencies of sinusoids that were 
subsequently filtered out 



Line 


Estimated Frequency of Sinusoid (Hz) 


1 
2 
3 
4 
5 


650, 4300 
650, 4300 

3900 

2000, 3600 
60 
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Fig. 6 — Line 1 power spectra: bottom 13 from beginning of line 1 data, top 11 from 
middle of line 1 data (N = 1000). 

spectrum density. While this criterion is arbitrary, independent experi- 
mental evidence to be discussed shortly indicates it is adequate from a 
statistical point of view. 

Since many statistical tests require uncorrelated samples, it is neces- 
sary to filter out these sinusoids, as well as to compensate for distor- 
tions in the data from the measurement system. This implicitly assumes 
that telephone noise can be modeled as the sum of a deterministic 
process, sinusoids at various frequencies, and a purely stochastic 
process, which will be characterized in greater detail. This was ac- 
complished using low-pass, band-stop, and high-pass linear-phase 
digital filters designed by computer programs developed by L. Rabiner ; 
the filtering was carried out in the discrete time domain by convolution. 
Figure 7 shows the power spectrum of a filtered segment of the data 
shown in Fig. 4. 
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Fig. 7 — Typical power spectrum for filtered line 1 data. 

Because of the difficulty in finding telephone lines completely free 
from sinusoidal interference, the question arises as to how much 
harmonic content can be tolerated in performing various statistical 
tests. Work carried out elsewhere in a different context has examined 
this issue from an experimental viewpoint ; 29 the principal findings were 
that the amplitude statistics are apparently not significantly degraded 
by the linear filtering, if the sinusoid is the same size or smaller than 
the observed noise levels. This topic can be a subject for future research. 

2.3 Covariance estimation 

It is assumed in many statistical computations that the data are 
statistically independent. In practice, the data usually depend to some 
extent on each other, and it is often quite difficult to quantitatively 
estimate the effects of this lack of independence. One indication of 
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Fig. 8 — Typical line 1 autocorrelation function (N — 1000). 
independence is the estimated autocorrelation function, 

&nn(lAt) = g^— • I N f l n'(KAt)n'(KM + lAt) 1=1,-- 
where 



,N-l, 



and 



n'(KAt) = n(KAt) - if L n(ZA«) 
iv j=i 



*n„(0) = ^ L n"(XA«). 
A* a:=i 



A typical autocorrelation of filtered data is plotted in Fig. 8. A sinusoid 
that was not filtered out is quite evident at approximately 1400 Hz 
(see also Fig. 7) ; ignoring this sinusoid, 29 the autocorrelation appears 
to be approximately zero for I ^ 3. 
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If the data are wide-sense stationary and ergodic, then the autocor- 
relation and the power density spectrum are a Fourier transform pair. 30 

Examination of the filtered waveform in Fig. 4 indicates that the 
samples appear uncorrelated, i.e., they are scattered in a random 
fashion about a location parameter. 

The sample normalized cross covariance for two different segments of 
data, {x(At), x(2At), •••, x(NAt)} and {y(At), y(2M), ■■■, y(NAt)}, 
is defined as 

~ N f l x'{KAt)y'{KAt + lAt) 



where 



x'(KAt) = x(KAt) - i £ x(lAt), 



N 



1=1 



and 



y'(KAt) = y(KAt)-±ZvW), 

R xx (0) = 4 £ x»(KAt), 
Jy k=i 



Ry U (0) = ± E y"(KAt), 

*™ K. =1 



and is shown in Fig. 9 for two typical segments of filtered data. From 
this as well as other data, the filtered telephone noise data examined 
appear to be uncorrelated over short time intervals. 

Since the data, after filtering, appear approximately uncorrelated, 
they will now be characterized in greater detail. 

2.4 First-order filtered data amplitude statistics 

A nonparametric statistical description of first-order noise amplitude 
statistics provides a great deal of useful information. For example, if 
the data are independent identically distributed random variables, 
they can be completely characterized by their empirical cumulative 
distribution function. 31 The work in this section relies heavily on 
graphical methods for data analysis, to give more physical insight 
into the nature of the data. 32 

The empirical or sample cumulative distribution function is defined 
as 

fi,y} 4 number of observations less than or equal to X 
total number of observations 
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TIME 



Fig. 9 — Typical line 1 crosscorrelation function (iV = 1000). 



which is a function of \xk], the set of observations. The sample 
histogram is defined as 

P(X, X + A) = number of sample values in [X, X + A] , 

where A is the bin width. Figure 10 is a plot of a typical empirical 
cumulative distribution function, and Fig. 11 shows a typical sample 
histogram. These two figures imply that the first-order probability 
density for the data is roughly bell-shaped and symmetric. A simple 
graphical symmetry check on the empirical cumulative distribution is 
shown in Fig. 12; x K is plotted against xn-k+\, where K = 1, 2, • • •, 
[N/22, N = 1000 is the total number of observations, and [xk] is the 
set of ordered observations. If the empirical cumulative distribution 
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Fig. 10 — Typical line 1 empirical cumulative distribution function (N = 1000). 

is symmetric, these points lie on a straight line with negative unit 
slope ; this is apparently the case. 

The next quantities of interest are central moment estimates, which 
are defined as follows : 33 

1 N 

x = sample mean = -^ Y. %j, 



and 



s 2 = sample variance = t? Z! (X, — x) 2 , 



N 



d 3 = sample skewness = Tr 51 (xj — x) 3 /(s 2 ) i , 

JS y = i 



N 



di = sample kurtosis = ^ H (xj — a;)V(s 2 ) 2 . 

iV j = \ 
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Fig. 11— Typical line 1 histogram (N = 1000). 

These parameters were estimated for ten segments of 1000 data for 
each of the five lines. Table III shows these estimates for the segment 
of each line whose fourth central moment was the median of all ten 
fourth-central-moment estimates of this line. The 5-percent significance 
level for 1000 independent identically distributed gaussian random 
variables with known mean and variance are 33 

-0.127 <d 3 < 0.127 
2.76 < a 4 < 3.26. 

Figure 13 shows a scatter plot of d 3 vs. a 4 for successive segments of 
1000 data for each of the five lines. Based on this evidence, it can be 
conjectured that lines 1, 2, and 4 are nongaussian, while the gaussian 
hypothesis cannot be rejected for lines 3 and 5. Since quite a large 
body of literature exists on gaussian random processes and these 
random processes are well understood, the gaussian hypothesis is not 
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Fig. 12 — Typical symmetry check on line 1 
function (N = 1000). 



empirical cumulative distribution 



lightly discarded: the evidence that the data are nongaussian should 
be much more convincing than that presented so far. 

A very convenient graphical method to check how well data fit a 
theoretical distribution function is the quantile-quantile, or Q-Q, plot. 32 
The gth quantile of a cumulative distribution function F(x) is defined 
here as the value x for which F(x) = q, ^ q ^ 1. A Q-Q plot plots 

Table III — Estimated telephone noise central moments 



Line 


£ 


s 2 


d 3 


d t 


1 


-87.9 


38,600 


0.05 


3.4 


2 


-80.1 


18,200 


0.08 


3.5 


3 


-80.1 


44,200 


-0.05 


3.1 


4 


-82.3 


87,200 


0.02 


3.3 


5 


- 1.06 


1,990 


-0.08 


2.9 
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Fig. 13 — Scatter plot of estimated third central moment d 3 vs estimated fourth 

central moment d 4 for at least ten successive segments of 1000 data (N = 1000) for 
all five lines. 



quantiles of the empirical cumulative distribution function against 
quantiles of the theoretical distribution. If the empirical and theoretical 
distribution functions are the same, the plot is a straight line with 
slope +1 passing through the origin. If the empirical and theoretical 
distribution functions are the same to within a linear transformation 
(i.e., to within a scale and location parameter) the plot is still a 
straight line. A typical quantile-quantile plot for line 1 filtered data 
against a gaussian distribution is shown in Fig. 14; the sample size 
was 13,000. The first 100 and last 100 quantiles, as well as every 
hundredth quantile in the middle, have been plotted, giving the 
illusion of discontinuity during the transition from the middle to the 
tail quantiles. 32 Ten observations in each tail are widely scattered. 
Figure 15 shows the central portion of the quantile-quantile plot with 
these observations excluded. The tails curve toward horizontal lines, 
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Fig. 14 — Q-Q plot for 13,000 line 1 data against gaussian distribution (X = sample 
mean, S 2 — sample variance). 

another indication of the long-tailed nongaussian nature of the data. 
The 10 points on each tail were found to be highly correlated : These 
very large excursions occurred in clumps of two, three, and five at a 
time, violating the assumption that the data are independent. For 
comparison, Fig. 16 shows a quantile-quantile plot for line 5 filtered 
data against a gaussian distribution; the sample size was 11,000. The 
straight line is a good indication that these data are gaussian. 

III. MODELS 

3.1 Central limit theorem 

Since noise on telephone lines is presumably due to a large number of 
independent causes, it is worthwhile to digress and review the central 
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limit theorem. The material presented here is largely tutorial, following 
standard references. 3435 The close association between the central limit 
theorem and the gaussian distribution is remarkable because of its 
algebraic closure property : If two independent random variables are 
both gaussian, their sum is also. It is much less widely known that the 
gaussian distribution is a special case of a larger family of distribu- 
tions, which arise from the central limit theorem and exhibit the same 
closure properties as the gaussian, the stable distributions. 

The reason for the importance of the gaussian rather than the 
entire stable distribution family is that only the gaussian distribution 
has a finite variance, and infinite variance is felt to be physically 
inappropriate in virtually any context. However, this is naive in that 
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Fig. 16- — Q-Q plot for 11,000 line 5 data against gaussian distribution (X = sample 
mean, S 2 = sample variance). 



the gaussian distribution is unbounded and unbounded quantities are 
also felt to be physically inappropriate. The gaussian distribution 
may describe a particular situation adequately over a certain range ; an 
infinite-variance distribution may model an actual situation over a 
greater range of a parameter. Both distributions may be physically 
inappropriate, but the infinite variance may, in this sense, account for 
the observations better than the gaussian. 

Mandelbrot has pioneered in developing and popularizing this 
notion, 36-3 ' for example, in connection with economic phenomena 39-42 
and in error statistics of digital signals transmitted over telephone 
lines. 9 Consider, for example, the distribution of changes in stock 
market prices. Mandelbrot 39 and Fama 43 have shown that, although 



1284 THE BELL SYSTEM TECHNICAL JOURNAL, SEPTEMBER 1974 



the change in stock market prices is bounded, the probability of very 
large deviations is so great that many statistical techniques that assume 
an underlying distribution with finite variance are not applicable. 
Stock market prices may be modeled as a sum of a large number of 
random variables ; similarly, at any instant of time, telephone noise is 
presumably the sum of a large number of random variables. The sum 
of a large number of infinite-variance variables is often dominated by 
one or a few of the summands 44 — a theoretical property of infinite- 
variance distributions. The key feature common to these models is 
that the limiting distribution remains the same if an arbitrary but 
finite number of terms are dropped from the sum. This intuitive notion 
can be made precise and, subject to a mild restriction on the distribu- 
tion from which the summands are drawn, leads naturally to the central 
limit theorem. 45 

Among infinite-variance distributions, the stable distributions play 
an important role, because only stable distributions can be limiting 
distributions of suitably normalized sums of independent identically 
distributed random variables, as well as because stable distributions 
are closed under convolution. Some pioneering work on the statistical 
analysis of data from a stable distribution has been carried out already ; 
the analysis described here is a straightforward application of this 
work. 46-48 Before detailing that work, a summary is presented of some 
properties of stable distributions. 

A distribution function P(x) is called stable if, for all a± > 0, &i, 

a 2 > 0, 62, there exist constants a > 0, b such that 

P(aix + 6i)*P(a 8 a: + 6 2 ) = P(ax + 6) 

holds. 49 Every stable distribution has a continuous density ; the stable 
distributions discussed in this work have unimodal densities that are 
analytic throughout their support. 50 The random variable x is stable 
if and only if the logarithm of its characteristic function is 51 

In [*(«*■")] =lnO,(w)] 

— |ct«| a [l — 10 • sign (w) tan (ira/2)'] + iSw a j* 1 

— I cw I [1 — i/32/7r sign (w) In | cw | ] + ihw a = 1 

- 1 £ £ 1 < a ^ 2. 

Thus, every stable law is described by four parameters a, /3, c (or 
7 = c a ), 8, where a is the characteristic index, /3 is associated with the 
skewness of the distribution, c is a scale parameter, and 5 is a location 
parameter. If /3 = 0, the distribution is symmetric about x = 5. If 
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j8 > and < a < 2, the distribution is skewed to the right, and the 
degree of skewness increases as increases; conversely, if < 0, the 
distribution is skewed to the left and the degree of skewness increases 
as /3 decreases. For a = 2, is irrelevant. 52 

If s„ is the suitably normalized sum of n independent identically 
distributed random variables xi, Xi, • • •, x n , 

S n = ^- (Xl + X 2 + ■ • • + X n ) — A n 
tin 

where B„ and A n are normalizing constants, then the distribution of x 
is said to belong to the domain of attraction of a stable distribution 
with characteristic index a if the distribution of s„ converges to this 
stable law as n goes to infinity ; 53 this distribution belongs to the domain 
of partial attraction of a stable distribution if the distribution of s„ 
converges only for some subsequence. 54 A stable distribution with index 
a has absolute moments of all orders strictly less than a, i.e., 
£[|x| p ] < °° forO ^ p < a. 55 

Stable distributions and densities can be expressed as a power series. 56 
In several cases, this series can be considerably simplified to yield 
analytic closed-form expressions; these cases are a = 2 (gaussian), 
a = 1 and = (Cauchy), and a = \ with j8 = ±1. Figure 17 depicts 
several stable density functions. 

If xi, x%, • • • , x n , • • • , are independent random variables drawn from 
r distributions, each within the domain of attraction of stable laws with 
indices drawn from the finite set (on, • • -, a T ), then under certain con- 
ditions on the number of representatives of each distribution, the 
suitably normalized sum of these variables converges to a distribution 
that is the convolution of r stable distributions. 57 - 58 

The question of rate of approach to the limiting distribution of a 
sum of suitably normalized, independent, identically distributed 
random variables is well understood if the limiting distribution is 
gaussian (a = 2). 59-61 If the limiting distribution is in the domain of 
attraction of a stable distribution, a variety of results are avail- 
able. 30 - 35 - 43 ' 47 The most useful result 62 available at present, from a data 
analysis point of view (see Ref. 63), loosely states that the difference 
between the actual distribution of the sum of N suitably normalized 
random variables and the limiting stable distribution (0 < a < 2) is 
bounded by a linear combination of terms of the order of N~ lla and 
jy-(2-o)/a a s an example, consider the case a = 1.9: one term is 
N -v a = jy-o.53 wn ji e t h e other term is N-v- a) ' a - N~°- ob3 ; N must be 
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Fig. 17 — Some symmetric stable first-order probability density function. 

astronomically big to reduce this second term to a value smaller than 
0.1, which indicates how slow this rate of convergence to a limiting 
stable distribution may be. 63 Thus, in many practical situations, cau- 
tion must be shown in going to the limiting distribution. 64-69 Noise 
on telephone lines is possibly a case in point. 

Section 3.2 discusses how filtered data from the three nongaussian 
telephone lines are fit to stable distributions. Since these distributions 
have no second moments, the modifications necessary to properly 
interpret power spectra and covariance estimates, as well as auto- and 
crosscorrelation estimates, for these three lines are not clear. This 
whole area must be subject to further research. 38 

As a final aside, the question of ergodicity, of relating time average 
statistics to ensemble average statistics, will not be addressed here: 
The filtered data are assumed to be an ergodic random process. 
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3.2 Symmetric stable distribution model 

In this section, various statistical tests are described for determining 
if telephone noise on the three lines that appear nongaussian can be 
adequately modeled by a symmetric stable distribution (0 < a ^ 2, 
= 0). 

A series of estimators for symmetric stable distribution (1 < a < 2) 
parameters have recently been developed. 47,48 These estimators are 
based on statistics easily derived from the empirical distribution func- 
tion; they have been compared with maximum likelihood estimates 
and found to offer reasonable agreement when suitable precautions, 
such as a large sample size for a near 2, are taken. 46 These parameter 
estimates are 

1 (0.5 +p) 

5 - HXT E xk V = 0125, 0.250, 0.375 

2,J\p K = (0.5-p) 

1 (~ ~ \ 

C = 1 654 ^ ~~ X °- 2S )> 

where x T is the value of the rth empirical quantile, $ is a trimmed mean, 
and 6 measures the spread of the distribution. To estimate the charac- 
teristic index a, an auxiliary variable z q is first computed 

z q = * 9 ~ 5 1 "' = 0.827 f" ~ ^- g 

&C #0.72 — #0.28 

q = 0.9995, 0.995, 0.99, 0.985, 0.98, 0.975, 0.97, 0.96, 0.95, 0.94, 0.92, 

and then d is obtained as a function of z q from tables in Ref . 48. & is a 
measure of how rapidly the distribution approaches its asymptotic 
values. For line 1, with a sample size of 13,000, it was found that 

5 = -87.4, 
c = 132.0, 

^(20.99) = <2(Z0.995) = 1.95. 

In addition, this was carried out for a sample size of 1000 thirteen 
times, a sample size of 2000 six times, a sample size of 3000 four times, 
a sample size of 4000 three times, and a sample size of 5000 two times. 
The results are tabulated in Table IV for the q = 0.98 fractile. Different 
choices of q resulted in practically the same estimates. 

Larger and larger samples were used because, if the data really 
come from a stable distribution, then the parameter estimates would 
presumably converge to their true values with increasing N. 
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Table IV — Line 1 symmetric stable distribution 
parameter estimates 



N = 1000 


N = 2000 


N = 3000 


N = 4000 


N = 5000 


-88.6, -88.8, -86.9, -87.2 
; -88.2, -88.0, -87.6, -87.6 
° -88.3, -88.7, -86.6, -87.5 

-87.2 


-87.0, -88.1 
-87.6, -88.5 
-87.7, -88.7 


-88.1, -87.8 
-87.8, -88.0 


-87.9, -87.8 
-88.1 


-87.9, -88.0 


132.1, 127.4, 122.9, 124.7 
• 145.2, 134.5, 128.6, 119.8 
c 143.4, 134.5, 130.5, 124.2 

137.6 


123.7, 141.1 

124.0, 137.6 

135.1, 128.7 


125.8, 135.7 
129.1, 135.1 


125.5, 132.3 
135.9 


130.6, 132.1 


1.92, 1.94, 1.96, 1.90 

1.90, 1.89, 2.00 

a 1.90, 1.99, 2.00, 2.00 

1.91, 1.98 


1.93, 1.94, 1.92 
1.92, 2.00, 1.93 


1.94, 1.92, 1.93 
1.97 


1.93, 1.91 
1.97 


1.94, 1.96 



Figure 18 shows a Q-Q plot of 13,000 line 1 data against a symmetric 
stable distribution with a = 1.94, while Fig. 21 shows the same plot 
with 10 points on either end excluded. These points were excluded 
because they were possibly atypical observations, and because they 
were highly correlated to one another. Again, as in the gaussian Q-Q 
plots, only the first and last 100 empirical quantiles, as well as every 
one hundredth between have been plotted, giving the false illusion of 
discontinuity in the observations. The eye is quite sensitive to devia- 
tions from a straight line for quantile-quantile plots; in particular, 
a = 1.94, 1.95, 1.96 could easily be distinguished from one another 
(Figs. 18 to 23). The data appear to be slightly skewed, so a non- 
symmetric (|/3 1 « 1, ^ 0) stable distribution might indeed provide 
a better fit to the data. As a increases from 1.94 to 1.96, the stable 
distribution has shorter and shorter tails, and the points in the tails 
bending towards the vertical for a = 1.94 align with the rest of the 
data for increasing a. 

As a check on these estimates, W. DuMouchel has supplied the 
authors with a computer program that numerically calculates maxi- 
mum likelihood estimates of parameters of stable distributions, as well 
as of their covariances. 46 DuMouchel has shown the maximum likeli- 
hood parameter estimates are asymptotically normal, so that some 
statistical techniques developed for data analysis of gaussian samples 
can be brought to bear. 70 The method used is numerical, nonetheless, 
so two possible pitfalls must be kept in mind. 

(0 For ease in numerical calculations, the data were aggregated 

into bins, thereby losing information. 
(ii) A Newton-Raphson-type of algorithm was used that approxi- 
mates the first and second derivatives of the likelihood function 
with differences. 
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Fig. 18 — Line 1 Q-Q plot against a stable distribution, a = 1.94 (N = 13,000). 

DuMouchel 71 has observed that the first approximation is the more 
critical of the two. The second approximation was investigated using 
a simplex algorithm rather than Newton-Raphson which did not 
compute discrete approximations to derivatives, with results consistent 
to those now described. 

For line 1 data, with a sample size of 13,000, the numerical maximum 
likelihood stable distribution parameter estimates were 

A = 1.95 
$ = -0.006 
c = 132.7 

$ = -88.8. 
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Fig. 19— Line 1 Q-Q plot against a stable distribution, a = 1.95 (N = 13,000). 

The numerical approximation to the estimated parameter covariances 
are shown in Table V. 

The large variance of /3 compared to the other estimates has been 
observed by DuMouchel ; 46 the cause is unknown. Although Q-Q plots 

Table V — Parameter estimate covariances (x10 6 ) 



a 


c 





5 


410 


11 


-57 


-1 


11 


63 


-41 


-2 


-57 


-41 


13,551 


430 


-1 


-2 


430 


248 
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Fig. 20— Line 1 Q-Q plot against a stable distribution, a = 1.96 (N = 13,000). 

indicated a slight skewness, i.e., 13 < and |/3 1 <3C 1, the interpretation 
of the maximum likelihood estimate for /3 was obscured by this large 
variance. 

As a check on these results, maximum likelihood parameters of stable 
distributions were estimated for 78,750 filtered data from lines 1 and 2, 
corresponding to approximately 10 seconds of telephone noise. A 
Newton-Raphson-type algorithm was used; the parameter estimate 
covariances were comparable to those just discussed. The results were: 





& 


/3 


c 


5 


Line 1 
Line 2 


1.96 
1.94 


-0.0084 
-0.0014 


218.0 
93.5 


-81.35 
-80.30 
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Fig. 21 — Center portion of line 1 Q-Q plot against a stable distribution, a = 1.94 
(N = 12,980). 

Two other pieces of evidence that telephone noise can be fitted by 
a stable distribution are now presented : the studentized range test and 
the likelihood ratio test. These have been discussed elsewhere; 47,71 
for large amounts of data, caution is necessary to interpret the results 
of these tests properly. On the other hand, since the data here are 
apparently close to gaussian, large amounts of data must be examined 
to make clear the nongaussian nature of the noise. Thus, the results 
of these tests must be very carefully interpreted, and are included for 
the sake of completeness. 

For line 1 data, testing the gaussian hypothesis at a 0.5-percent 
significance level via the studentized range test for sample sizes of 
1000 led to mixed results: some segments of data fell within these 
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Fig. 22 — Center portion of line 1 Q-Q plot against a stable distribution, a = 1.95 
(N = 12,980). 



limits, others fell outside. However, for a sample size of 10,000 the 
result of the studentized range test clearly fell outside the confidence 
intervals. 

A likelihood ratio test was used to test 10,000 line 1 data at 1-percent 
significance levels against five hypotheses. The stable distribution 
hypothesis was rejected for a = 2.00,a; = 1.98,a = 1.90, anda = 1.85, 
but could not be rejected for a = 1.95. 

3.3 Other central-limit-theorem-based models 

What other models arise that might adequately account for the 
data, while having the same central-limit-theorem-based appeal as the 
stable distributions? First, it is possible the data examined he in a 
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Fig. 23 — Center portion of line 1 Q-Q plot against a stable distribution, a = 1.96 
(N = 12,980). 



domain of partial attraction of a stable distribution (which is wider 
than the domain of attraction 72 ). If enough data were examined, it 
might be possible that a would approach 2. A second possibility is to 
model the data as a convolution of r stable distributions, each with 
its own domain of attraction ; 73 presumably, each distribution could be 
attributed to a separate physical process. A third possibility is that the 
data are drawn randomly from m gaussian distributions, each with 
different mean and variance; for example, the data could be drawn 
from a low-variance gaussian a fraction P of the time, and from a 
high-variance gaussian a fraction (1 — P) of the time. A fourth pos- 
sibility is to model the data as a nonstationary gaussian random 
process, which is a special case of a nonstationary stable random 
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process, or by a doubly stochastic gaussian random process (where 
the mean and the variance are themselves random processes), which is 
a special case of a doubly stochastic stable random process (where all 
four parameters are themselves random processes). While these non- 
stationary and doubly stochastic models do not appear to be necessary 
to adequately model the data discussed here, over longer time intervals, 
such as days, weeks, months, or years, the simple models might be 
inadequate while these more complicated models might be more ap- 
propriate. Presumably, other classes of models exist. 

It is difficult to refute these alternative models offhand. Recall that 
the original goal was to find a mathematically tractable model for 
telephone noise; the model discussed here is simple and agrees intui- 
tively with the physics of the noise. The other models are more com- 
plicated. To be of practical use, however, they must be so oversimplified 
that the intuitive agreement with the physics of the noise is lost. It is 
hoped that the class of models based on stable distributions will lead 
to more insight into how telephone noise limits voice communication 
and data transmission and, more important, into new ways for combat- 
ing this noise. 

3.4 Gaussian-plus-filtered-Poisson-process model 

A model involving more parameters than the previous one is now 
investigated. This model assumes that telephone noise is due to a sum 
of two independent random processes. The low- variance part is assumed 
to be white and gaussian, while the high-variance process is assumed 
to be a filtered Poisson process. This type of model was popularized 
by Snyder, 74 and has been used in optical communication 75 ' 76 and ELF 
communication 77 to assess theoretically optimum and suboptimum 
receiver structures. It has intuitive physical appeal: for instance, 
the low-variance component can be attributed to thermal noise and 
electromagnetic crosstalk, while the high-variance component can be 
attributed to switch arcing and thunderstorms. It is convenient to 
view the filtered Poisson process as the output of a linear dynamical 
system, whose input is an impulse train; the area of the impulse is 
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Fig. 24 — Block diagram for generating a gaussian-plus-filtered-Poisson process. 
1296 THE BELL SYSTEM TECHNICAL JOURNAL, SEPTEMBER 1974 



assumed to be a random variable, while the instants of time at which 
the impulses occur follow a Poisson distribution with rate or intensity 
parameter X. Figure 24 and the following equation summarize this 
discussion. 



n(t) = ■ 



g(t) + E a K h(t - t K ) N(t) > 
g(t) N(t) = 0, 



where 



n(t) = gaussian-plus-filtered-Poisson process, 
g (t) = stationary gaussian random process, 
ax = area of Kth impulse, 
h(t) = impulse response of linear dynamical system, 

t K = time at which the Kth impulse occurs, 
N(t) = number of impulses that occur in [0, i). 

To completely describe this model, the following parameters must 
be estimated 

(i) The mean and variance of the gaussian random process. 
(ii) The probability density function for the impulse areas. 
(Hi) The Poisson process rate parameter X. 
(iv) The linear system structure. 

We recall that the original motivation for this work was to stimulate 
interest of communication theorists in receiver structures that detect 
or estimate signals corrupted by nongaussian noise. One advantage of 
this type model is that parameters can be related to receiver perform- 
ance limitations as well as to physical causes of noise. This helps in 
determining how much effort should go into improving the receiver as 
opposed to reducing the noise (e.g., by designing switches to operate at 
lower voltages). One disadvantage of this type of model is its great 
analytical complexity; it may be quite difficult to find analytic per- 
formance limitations, and to determine how sensitive these limitations 
are to model parameters. 75 

If the impulse areas {a K } are assumed to be independent identically 
distributed random variables that are independent of the times the 
impulses occur, the characteristic function for the first-order probability 
density function can be shown to be 

£[ e .'<-n«>] = exp jimco - V" 2 + x /' LEa(e MT) ) - l]dr , 
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where 

co = frequency, 

m = mean of gaussian random process, 
a 1 = variance of gaussian random process, 
E a ( ) = expectation of ( ) with respect to the random variable a, 
the impulse area. 

It is quite difficult to analytically invert E(e iun{l) ) to find the probability 
density for n(t). This in turn means maximum likelihood parameter 
estimates, and Cram6r-Rao lower bounds on parameter estimate co- 
variances are difficult to calculate analytically. For this reason, 
numerical approximations must often be used. To avoid these problems, 
a suboptimum parameter estimation method was developed : Each 
parameter of the model was estimated by itself. There is no guarantee 
that these estimates, when put together, will be close to the true 
parameter values. The sole reason for doing this was to make the 
problem tractable. Evidence presented later indicates this method 
provides an excellent (but perhaps suboptimum) fit to the data. 

Although the dynamics of the linear system can be quite complicated, 
only three simple cases are considered here. 

(i) h(t) = Ae- At u-x(t) 

1 t> 



(m) h(t) = ( A *'^ 0)2 \ e~ At cos utu-^t) m_i(0 = 
(in) h(t) = ( ^^"^ e-A'smutu-iit), 



t < 



which are perhaps the cases of greatest practical interest. 77 

Assuming the amplitude burst statistics to be independent of the 
instants of time at which bursts occur, and assuming the gaussian 
process to be independent of the filtered Poisson process, the mean and 
variance can be calculated (Table VI) for the steady state noise. E(a) 
is assumed to be zero in all models presented here. This completes a 
general discussion of the gaussian-plus-filtered-Poisson-process model ; 
the methods used to estimate the model parameters are now described 
in detail. 

3.5 Gaussian random process parameter estimation 

HE (a) = 0, then E[n(t)~\ = m, and the sample mean is an unbiased 
estimate of the true mean of the gaussian process. If the data are 
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Table VI — Mean and variance for n(t) 



hit) 



E[»(03 



Variance Qn(0] 



Ae- At u-i{t) 

A' + u 2 _.. . ,.s 
2 e * « cos ul u_i (t) 

A* + 



e At sin utu-i(t) 



m + \E(a) 
m + \E(a) 
m + \E{a) 






° 2 + 7 



+I(^) , [a-TO]«* 



trimmed to exclude a fraction (e.g., 25 percent) of the data with largest 
absolute deviation from the sample mean, then presumably most 
values of n(t) that contain large contributions from the filtered Poisson 
process will be excluded. 

The estimates for the mean and variance of the gaussian process 
were consistent with estimates to be presented later for X, A, w, and 
E(a 2 ). No bounds are available on the bias or variance of these parame- 
ter estimates. The results are summarized in Table VII. The sample 
variance has been rescaled, based on the assumption that the data 
were drawn from a truncated gaussian distribution. 

3.6 Poisson process parameter estimation 

The Poisson process intensity is closely related to the times at 
which bursts of high-amplitude telephone noise occur. Many definitions 
of a noise burst are possible. The definition chosen here, although 
arbitrary, was found to be qualitatively insensitive to the parameters 
defining a burst. The absolute value of a zero mean waveform is shown 



Table VII — Gaussian random process trimmed mean and 

variance (Total data = 10,000, with a fraction p 

trimmed from either side) 



Line 


Truncated Sample Mean 


Rescaled Truncated 
Sample Variance 




p = 0.125 


0.250 


0.375 


p = 0.125 


0.250 


0.375 


1 
2 
3 
4 
5 


-88.0 
-79.5 
-77.6 
-81.1 
- 1.0 


-87.6 
-78.8 
-75.2 
-80.9 
- 1.0 


-87.3 
-78.7 
-73.6 
-81.0 
- 1.1 


35,414 
17,151 
49,306 
83,076 
1,664 


35,813 
17,058 
48,840 
88,430 
1,672 


35,208 
16,815 
46,554 
90,864 
1,683 
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in Fig. 25. The duration of the burst is the time difference between the 
moment that the absolute value of the waveform climbs above an 
upper threshold, T U pper, and the time that the absolute value of the 
waveform drops below a lower threshold, T\ 0W(ST) provided the waveform 
stays below the lower threshold for at least a predetermined period of 
time, called the guard band Gb, which separates one burst from the 
next. The instant of time a burst occurs, t mai , is the first instant at 
which the absolute value of the burst attains its maximum value, P ma x- 

A large number of statistics can characterize a point process. A 
number of nonparametric statistics were used to characterize the points 
in time at which bursts occur, and then, based on this evidence, the 
burst data were examined in greater detail to see if they could be 
adequately modeled by a renewal process in general, or a Poisson 
process in particular (e.g., see Ref. 78). 

The two statistics that were first examined were 

(i) The sample mean time between bursts as a function of Tapper, 
Slower, and Gb- 

(ii) The empirical cumulative distribution function and the histo- 
gram for time intervals between events as a function of T npptlI , 
Slower, and Gb- 

The effect on these statistics of variations of Tapper, Slower, and G B is 
now discussed. For line 1 data, for example, T\ OVIBI was fixed at 600 
(roughly three standard deviations from the sample mean), T nvxi6T was 
set at 600, and Gb was varied from 0.1 to 0.9 millisecond, in steps of 
0.2 millisecond. Tapper was then set at 800, and Gb was varied in an 
identical manner. Finally, Tapper was set at 1000, and Gb was again 
varied in the same fashion. The number of events observed was found 
to be insensitive to the choice of guard band as well as to Tapper. The 
guard band was therefore set at 0.5 millisecond, and T nvv6T was set 
equal to Tiower (which also avoids ambiguity in the meaning of 
threshold). 

A typical empirical cumulative distribution function and a histo- 
gram for the time intervals between bursts are shown in Figs. 26 and 
27, for Supper = Slower = 800. Typical histograms and empirical cumu- 
lative distribution functions for time intervals between bursts for 
Supper = Slower = 600 and Tapper = Slower = 1000 had the same 
shapes as those in Figs. 26 and 27. If the bursts were Poisson-dis- 
tributed, the distribution function would be completely specified by 
this information. 
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Fig. 25 — Definition of burst parameters. 
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Fig. 26 — Empirical cumulative distribution function for line 1 time intervals 
between bursts (N = 768). 
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Fig. 27 — Histogram for line 1 time intervals between bursts (iV = 768). 

Investigation of the burst statistics of the other two nongaussian 
lines yielded findings similar to those for line 1. 

Next, the second-order statistics of the time intervals were inves- 
tigated. Figure 28 shows a scatter plot of the (K + l)th interval 
against the Kih. interval. This plot shows that long intervals followed 
by long intervals are unlikely compared with long intervals followed 
by short intervals, or short intervals followed by long or short intervals. 
Note that it is still possible that the intervals are generated by a 
renewal process with a nonexponential distribution. Note also that 
approximately half the points plotted fall in the lower left corner 
square, < time* ^ 100 and < timex+i ^ 100. 

Another set of second-order statistics of interest is 

(i) The estimated autocorrelation of the time intervals between 
bursts 
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Fig. 28 — Scatter plot of time intervals between bursts for line 1 data (N = 768). 

^ N ±*(ti-i)(t l+K -i) l N 

*„(*)- '- 1 Rtti0) -, * = #£> 

where 

£«(0) = i £ (fc - *) 2 

iv x=i 

and 

ty = length of jth time interval. 

(it) The estimated power spectrum of the time intervals between 
bursts, which is the Fourier transform of R tt (K) if the process 
is wide-sense stationary and ergodic. 79 The same issues that 
were discussed earlier when the noise amplitude power spectrum 
was estimated are relevant here. 
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(Hi) Contingency tables, 80 which correspond to estimates of the 
second-order joint density of intervals that are a specified 
number of intervals apart from each other. 

(iv) The estimate of the conditional expectation of the length of an 
interval given the length of the interval j intervals earlier 



Elt K+] \tKl= J t K+ idp(tK+i\tK). 



The data analysis presented here focuses on the first two of these 
second-order statistics. Since only a small number of events was 
observed typically (e.g., 147 for T UPPBT = Ti ower = 1000, 768 for 
Supper = Slower = 800, for line 1 bursts), statistical fluctuations would 
have obscured the interpretation of the last two statistics. Figure 29 
shows a typical sample autocorrelation function for 1000 intervals, 
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Fig. 29 — Autocorrelation function for line 1 time intervals between bursts 
(AT = 1000). 
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Fig. 30 — Periodogram of line 1 time intervals between burets (N = 1000). 

while Fig. 30 shows the corresponding periodogram. Based on this 
evidence, it seems possible the bursts arise from a renewal process. 
To test this model, the so-called summed empirical periodogram 
S(lf ), defined as 



S(lfo) - £ 



\f(Kfo) 



where \f(Kf ) | 2 - periodogram at frequency Kf is plotted in Fig. 31, 
along with 5 percent significance limits to test the renewal hypothesis, 
according to which S should be straight. From this and other evidence 
not presented here, it appears that the bursts analyzed can be reason- 
ably modeled as a renewal process. 

Next, it is useful to characterize the renewal process model in greater 
detail. Such a process can be statistically described by a variety of 
measures. 81 The two considered here are 

(i) The survivor function S(t) = fraction of intervals greater than 
or equal to t. 
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Fig. 31 — Normalized cumulative periodogram for line 1 time intervals between 
burets, Tupp,, = T iomt = 800 (N = 768). 

(ii) The hazard function H(t) = P(t)/S(t), where P(t) is the prob- 
ability density function of the renewal process and S(t) is the 
survivor function. 

Note that H(t)-M is the probability of an event in an interval of 
length At seconds centered at t, which can be interpreted as the fraction 
of intervals in the range (t — A/2, t + A/2), given that the last event 
was t time units ago. 

The survivor and hazard functions are related by 82 

S(t) = exp T- J t H(x)dx\ = r P(x)dx. 

In a Poisson process with constant intensity X, these simplify to 

P(t) = Xe" x 'l 

S(t) = e~ u it>0. 

H(t) = X 
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Fig. 32 — Natural logarithm of empirical survivor function for line 1 time intervals 
between bursts (N = 768). 



In practice, only estimates of the survivor and hazard function are 
obtained. The empirical survivor function S(t) is the fraction of 
observed intervals greater than or equal to t. The empirical hazard 
function fi(t) equals P(t)/S(t), where P(t) is the fraction of observed 
intervals in the range [J - A/2, t + A/2]. Figures 32 and 33 show 
representative empirical survivor and hazard functions for bursts 
observed on line 1. Statistical fluctuations are quite apparent. For long 
time intervals, only one or two events fall in any particular bin, giving 
the appearance of a trend in 8(t) ; the survivor function is more stable 
for long intervals. The log survivor function roughly follows a series 
of straight-line segments with different slopes, indicating the process 
can be modeled as a Poisson process whose rate parameter is equal to 
the absolute value of the slope of the straight line approximations. 



TELEPHONE NOISE 1307 



0.04 



0.03 - 



<x 

z 
o 



0.02 - 



0.01 - 





T UPPER " 800-0 














T LOWER = 800.0 














Gg = 0.5 msec 














BIN WIDTH = 0.3 


msec 








• 
















- 


RECIPROCAL OF MEAN TIME 


INTERVAL 






/ 


BETWEEN 


BURSTS 










I 


























• 












• 




• 








• 


•. 


• * 












• 


• • • 

• .. | % • 


'•"• — r - -— 


'*"" I 


• 


m 


* • 
I 



10 20 30 

t = TIME INTERVAL BETWEEN BURSTS 
(sec/1000) 



40 



50 



Fig. 33 — Empirical hazard function for line 1 time intervals between bursts 
(N = 768). 



In order to adequately model the time intervals between bursts, 
two models more complicated than a simple Poisson process were 
investigated. The first model was a pth order autoregressive process, 
while the second was a doubly stochastic Poisson process, where the 
Poisson intensity A was a random variable specified by a two-state 
Markov process (see Ref. 8). The autoregressive model (for p = 5 and 
15) did not adequately account for long time intervals between events. 
The doubly stochastic Poisson process model did not adequately 
account for short time intervals followed by long time intervals. 
Therefore, both these models were dropped in favor of a Poisson process 
model with constant intensity, even though the log survivor function 
could not be approximated by a single straight line. Since this is only 
one of at least six model parameters, it was hoped the overall goodness 
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of fit of the model would not be seriously degraded ; the evidence pre- 
sented later indicates this might be true. 

3.7 Pulse amplitude statistics 

It was assumed the burst amplitude and the instant of time at which 
the burst occurred were independent random variables. Figures 34 and 
35 show representative empirical cumulative distribution functions and 
histograms for line 1 burst amplitude data. Based on these curves, a 
number of distributions can be fitted to the data; only two will be 
discussed here, a two-sided log normal and a two-sided power Rayleigh 77 
(also known as generalized gamma 83 ). A two-sided log normal random 
variable I equals Rq, where R and q are independent random variables, 
with q equally likely to be + 1 or - 1 and R defined by a log normal 



1.00 



0.80 



0.40 



0.20 



"*«*" 



T, 



600 



T LOWER ~ 600 

G B = 0.5 MILLISECOND 
1000 POINTS 



0.60 



0.80 



1.00 1.20 1.40 1.60 

ORDERED OBSERVATIONS (MULTIPLIED BY 10~ 3 ) 



2.00 



Fig 34 — Empirical cumulative distribution function of maximum burst amplitude 
(N = 1000). 
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Similarly, a two-sided power Rayleigh random variable p equals Rq, 
where R and g are independent random variables, with q equally 
likely to be +1 or - 1, while R is defined by a power Rayleigh density 

Pb(u) = ( J-- )(^rV lw '^-' < K ^ 2, ^ \u\ < oo. 

Each density has zero mean; the log normal variance is e i<r ' +m , while 
the power Rayleigh variance is RqT(1 -f- 2/K). 

The parameters of each distribution could be fit to empirical cumu- 
lative distribution functions such as that shown in Fig. 34. The range 
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of parameters for line 1 data, for example, was : 

log normal : mean 5 ^ m ^ 6 

variance 0.01 ^ tr 2 ^ 0.10 

power Rayleigh : scale factor 500 ^ R ^ 2000 
exponent 0.50 ^ K ^ 0.70. 

The investigation of a much larger amount of data probably would 
have narrowed the range of these estimates substantially. 

3.8 Linear dynamical system parameters 

The approach chosen here for estimating linear dynamical system 
parameters was trial and error. The sample mean burst duration was 
set equal to the damping constant .4" 1 in both the first- and second- 
order systems. The second-order system oscillates at frequency w, 
which was arbitrarily chosen as (£) (damping constant) -1 , to obtain 
qualitative agreement with actually observed noise bursts (e.g., Fig. 3). 
For line 1 bursts, for example, A~ l = 0.1 millisecond for T upP cr = T\ over 
= 600, G B = 0.5 millisecond. 

3.9 Goodness-of-fit to data of gaussian-plus-filtered-Poisson-process model 

Only one test was carried out to provide some heuristic measure of 
goodness-of-fit of this model to the data. The test was analogous to a 
quantile-quantile plot. Using typical parameter estimates such as those 
just described, a computer generated a sample function of a gaussian- 
plus-filtered-Poisson process. These artificial data were sorted and 
plotted against actual (sorted) telephone noise from line 1 as shown in 
Figs. 36 and 37. 

The reason for performing just one test is the great difficulty in 
expressing analytically the distribution function for the gaussian-plus- 
filtered-Poisson-process model. Hence, it is very difficult to perform 
quantile-quantile plots of the actual data versus model quantiles, as 
well as to find maximum likelihood parameter estimates and Cramer- 
Rao lower bounds on parameter estimate covariances. 

3.10 Criticism of the gaussian-plus-filtered-Poisson-process model 

Many criticisms of this statistical analysis are possible. First, the 
question of optimally choosing parameters was never addressed and is 
still open. Since a large number of parameters must be estimated, a 
series of presumably suboptimum but easy-to-calculate estimates 
appeared to be the only feasible course. 
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_Fig. 36 — Line 1 data vs gaussian-plus-nltered-Poisson-process data (N = 13,000) 
(X = sample mean, S 2 = sample variance). 

Second, it is not clear how to relate the linear dynamical system 
parameters in the model to actual telephone system parameters. Where 
does the filtering occur in reality? Why should it be linear and station- 
ary? Other evidence 19 suggests that the linear dynamical system pa- 
rameters are not as well denned for other telephone lines as for the 
data examined in the present work. 

Third, the time intervals between bursts are not adequately modeled 
over the entire observation by a Poisson process. Two other more 
complicated models were investigated in order to account for this. 
Many other models can still be investigated. 

Fourth, a more general class of models was never investigated that 
includes the gaussian-plus-filtered-Poisson process as a special case. 
This model, mentioned briefly earlier, is a mixture of a low-variance 
gaussian distribution and a high- variance gaussian distribution ; during 
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Fig. 37 — Center portion of line 1 data vs gaussian-plus-filtered-Poisson-process 
data (N = 12,980) (X = sample mean, S 2 = sample variance). 

a fraction P of the time, the low-variance gaussian distribution is 
chosen to model the data, while during the other (1 — P) fraction of 
time the high-variance gaussian distribution is chosen. The reasons 
for not investigating this class of models were that the gaussian-plus- 
filtered-Poisson-process model comes closer to describing intuitively 
the physical process of telephone noise generation, and that it has been 
used by communication theorists in other applications 74-76 more than 
a mixture of gaussians. 

IV. CONCLUSIONS 

This study has presented evidence that noise on some lines consists 
of a deterministic component (e.g., sinusoids at various frequencies) 
and a purely stochastic component. It was assumed that these com- 
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ponents add. The data analyzed here suggest that the stochastic 
component is stationary over short periods of time (typically 1 second) 
and distinctly nongaussian. Two simple models have been proposed 
for the nongaussian noise, one based on stable distributions, the other 
on a mixture of a gaussian process and a nongaussian-filtered Poisson 
process. Based on the data analyzed here, both models agree intuitively 
with the physical processes generating telephone noise and appear to 
fit the data reasonably well. 

It is hoped this work will stimulate further research in this area; 
since only a small number of telephone lines were examined, the models 
presented here await confirmation by independent investigators. Other 
models than those discussed here may indeed more adequately account 
for noise on telephone lines. It is hoped this work will lead to greater 
insight into how telephone noise limits voice communication and data 
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Fig. 38 — Computer-generated gaussian-plus-filtered-Poisson-process sample func- 
tion (same process parameters as in Figs. 36 and 37; X = sample mean, S 2 = sample 
variance). 
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Fig. 39 — Computer-generated symmetric stable random process sample function 
a = 1.9 (c = 100, 5 = 0). 

transmission and, more importantly, will lead to new methods for 
combating this noise. 
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APPENDIX A 

This appendix is included to give the reader some feeling for the 
two models discussed here. Using typical parameter estimates and a 
first-order system, a computer generated and plotted a discrete time 
sample function of a gaussian-plus-filtered-Poisson process (Fig. 38). 
For comparison, Fig. 39 shows a computer-generated discrete time 
series, where each point was drawn independently from a symmetric 
stable distribution (characteristic index a = 1.9). 

APPENDIX B 

A graphic indication that the data can be better modeled by a 
nongaussian rather than gaussian distribution is now presented. The 
motivation for this work is found in Mandelbrot. 84 

Estimates for the sample mean, as well as second, third, and fourth 
central moments can be calculated recursively for larger and larger 
amounts of data. Figure 40 plots these estimates for 13,000 filtered data 
(only every tenth estimate is plotted). Note the tendency for the 
second, third, and fourth central moment estimates to wander rather 
than stabilize as more and more data are included, as evidenced by 
the jumps in the estimates. The sample mean, however, does stabilize; 
note the small ripple in this estimate, which presumably is due to a 
sinusoid at approximately 600 Hz that was not filtered from the 
data (see Table II). 

These results are qualitatively consistent with results for central 
moment estimates of computer-generated stable random variables 
(10,000 independent identically distributed samples, a = 1.96, /3 = 0). 
Central moment estimates of computer-generated gaussian random 
variables (10,000 independent identically distributed samples) did 
stabilize at the correct values, while exhibiting no apparent jumps 
such as in Fig. 40. These results are also consistent with the gaussian- 
plus-filtered-Poisson-process model, with the large jumps in the esti- 
mates presumably a result of the filtered Poisson process. 

Fig. 40 — Sample mean as well as second, third, and fourth central momeDt esti- 
mates for line 1 (N = 13,000). 
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